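Uso de uma ontologia de lugar urbano para reconhecimento e extração de evidências geoespaciais na Web (Use of an Urban Place Ontology for the Recognition and Extraction of Geospatial Evidence on the Web)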
Abstract
Queries that include at least one geographic term, such as a place name or a natural feature, currently make up a significant subset of the queries submitted to search engines. Interest in local information on the Web (local search) is increasing daily, and for this kind of search the Web is a vast repository of local geographic information. However, traditional search engines are limited in their ability to recognize the geographic scope of Web pages. Pages that refer to the same place under alternative names will probably not be retrieved together. Moreover, in many situations the geographic context is implicit in a page but can be inferred from, for instance, a telephone number or postal code. To address these problems, this thesis focuses on the local Web and presents an approach based on an ontology of urban place, which allows for the recognition, extraction, and geocoding of geospatial evidence with local characteristics, such as urban addresses, postal codes, and telephone numbers found in Web pages. This geospatial evidence is implicitly related to places, so the contents of a page, or parts of it, can be correlated to an urban geographic location. Search engines can then use such information to retrieve, for instance, pages related to services and activities in or near a certain location.
The main contributions of this thesis are therefore (1) the characterization of urban addresses contained in Web pages as sources of geospatial evidence, and the definition of patterns for their recognition and extraction; (2) the definition of OnLocus, an ontology of urban place that supports the recognition and extraction of geospatial evidence from Web pages; (3) the creation of a database for the recognition of Brazilian places, based on OnLocus; (4) the proposal of a strategy for the geographic categorization of a Web page, or parts of it, within a country's territorial divisions; and (5) the evaluation of the quantitative and qualitative characteristics of urban addresses found in pages of the Brazilian Web. All of these contributions have been validated experimentally, using real data from a set of 4 million Web pages. As an additional result, it was possible to obtain a snapshot of how addresses are used in pages from the Brazilian Web and, consequently, to better understand how to geocode them. The results of this thesis open up perspectives for new types of applications, such as navigational links based on geographic location, geographic classification of Web pages, Web-based geospatial data mining, and semantic annotation of pages.
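As a rough illustration of what recognizing postal codes and telephone numbers in page text can look like, the sketch below uses two regular expressions. It is not the set of patterns defined in the thesis and does not involve OnLocus. The names `CEP_PATTERN`, `PHONE_PATTERN`, `Evidence`, and `extract_evidence` are hypothetical, the sample address is made up, and the patterns assume the common written forms NNNNN-NNN for Brazilian postal codes (CEP) and (AA) NNNN-NNNN for telephone numbers with an area code.

```python
import re
from typing import List, NamedTuple


class Evidence(NamedTuple):
    """One piece of geospatial evidence found in a page (hypothetical type)."""
    kind: str   # "postal_code" or "phone"
    text: str   # the matched substring
    start: int  # character offset of the match in the page text


# Assumed written form of a Brazilian postal code (CEP): five digits,
# a hyphen, three digits, not embedded in a longer digit run.
CEP_PATTERN = re.compile(r"(?<!\d)\d{5}-\d{3}(?!\d)")

# Assumed written form of a phone number: two-digit area code in
# parentheses, then four or five digits, a hyphen, and four digits.
PHONE_PATTERN = re.compile(r"\(\d{2}\)\s?\d{4,5}-\d{4}(?!\d)")


def extract_evidence(page_text: str) -> List[Evidence]:
    """Return postal-code and phone matches in order of appearance."""
    found = []
    for kind, pattern in (("postal_code", CEP_PATTERN), ("phone", PHONE_PATTERN)):
        for match in pattern.finditer(page_text):
            found.append(Evidence(kind, match.group(0), match.start()))
    return sorted(found, key=lambda e: e.start)


if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    sample = "Rua Exemplo, 100 - Centro, Belo Horizonte - MG, 30130-005. Tel: (31) 3277-0000"
    for item in extract_evidence(sample):
        print(item.kind, item.text)
```

A matched area code or postal-code prefix could then be looked up in a gazetteer to associate the page with a place. That lookup is outside this sketch.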
Publication date: 2006